Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 4829 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 160.5 KiB |
| Average record size in memory | 34.0 B |
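The two memory figures in this table are related by a simple division. The sketch below is only a sanity check on the reported numbers; it assumes 1 KiB = 1024 bytes, and the variable names are illustrative.

```python
# Figures copied from the "Dataset statistics" table.
total_kib = 160.5
n_rows = 4829

# Assumption: 1 KiB = 1024 bytes.
total_bytes = total_kib * 1024
avg_record_bytes = total_bytes / n_rows

print(f"average record size: {avg_record_bytes:.1f} B")
```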
Variable types
| Numeric | 9 |
|---|---|
| Categorical | 1 |
msno has a high cardinality: 4778 distinct values | High cardinality |
num_25 is highly correlated with num_unq | High correlation |
num_50 is highly correlated with num_75 | High correlation |
num_75 is highly correlated with num_50 | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_25 and 2 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_25 is highly correlated with num_50 and 2 other fields | High correlation |
num_50 is highly correlated with num_25 | High correlation |
num_75 is highly correlated with num_25 | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_25 and 2 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_100 is highly correlated with num_unq and 1 other fields | High correlation |
num_unq is highly correlated with num_100 and 1 other fields | High correlation |
total_secs is highly correlated with num_100 and 1 other fields | High correlation |
num_50 is highly correlated with num_25 and 2 other fields | High correlation |
num_100 is highly correlated with num_unq | High correlation |
num_25 is highly correlated with num_50 and 1 other fields | High correlation |
num_75 is highly correlated with num_50 and 2 other fields | High correlation |
num_985 is highly correlated with num_75 | High correlation |
num_unq is highly correlated with num_50 and 3 other fields | High correlation |
num_985 is highly skewed (γ1 = 22.86078337) | Skewed |
msno is uniformly distributed | Uniform |
df_index has unique values | Unique |
num_25 has 1196 (24.8%) zeros | Zeros |
num_50 has 2235 (46.3%) zeros | Zeros |
num_75 has 2600 (53.8%) zeros | Zeros |
num_985 has 2537 (52.5%) zeros | Zeros |
num_100 has 192 (4.0%) zeros | Zeros |
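The Skewed alert reports γ1, the skewness coefficient. As a rough illustration, the sketch below computes the plain moment-based version (third central moment divided by the population standard deviation cubed). The sample values are made up for the example, and the profiling tool may apply a bias correction, so its figures can differ slightly from this formula.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (no bias correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical samples, not taken from the dataset.
symmetric = [1, 2, 3, 4, 5]
long_tail = [0, 0, 0, 0, 1, 1, 2, 135]  # mostly zeros plus one large value

print(skewness(symmetric), skewness(long_tail))
```

A column dominated by zeros with a few very large counts, as in the second sample, produces a clearly positive skewness.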
Reproduction
| Analysis started | 2023-05-18 18:12:04.032248 |
|---|---|
| Analysis finished | 2023-05-18 18:12:12.234227 |
| Duration | 8.2 seconds |
| Software version | pandas-profiling v3.0.0 |
df_index
| Distinct | 4829 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2435742.384 |
| Minimum | 517 |
| Maximum | 4828877 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 37.9 KiB |
Quantile statistics
| Minimum | 517 |
|---|---|
| 5-th percentile | 270323.6 |
| Q1 | 1235466 |
| median | 2449426 |
| Q3 | 3630864 |
| 95-th percentile | 4595879 |
| Maximum | 4828877 |
| Range | 4828360 |
| Interquartile range (IQR) | 2395398 |
Descriptive statistics
| Standard deviation | 1382347.658 |
|---|---|
| Coefficient of variation (CV) | 0.5675262159 |
| Kurtosis | -1.180608149 |
| Mean | 2435742.384 |
| Median Absolute Deviation (MAD) | 1195565 |
| Skewness | -0.01513463332 |
| Sum | 1.176219997 × 10^10 |
| Variance | 1.910885049 × 10^12 |
| Monotonicity | Not monotonic |
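Several of the figures in the quantile and descriptive tables can be derived from one another. The following sketch re-derives range, IQR, coefficient of variation, and variance from the values reported above; it is a consistency check on the published numbers, not a description of how the tool computes them.

```python
import math

# Figures copied from the tables above.
minimum, maximum = 517, 4828877
q1, q3 = 1235466, 3630864
mean = 2435742.384
std = 1382347.658

value_range = maximum - minimum  # Range
iqr = q3 - q1                    # Interquartile range
cv = std / mean                  # Coefficient of variation
variance = std ** 2              # Variance

print(value_range, iqr, round(cv, 6), f"{variance:.6e}")
```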
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 47813 | 1 | < 0.1% |
| 1354163 | 1 | < 0.1% |
| 2137050 | 1 | < 0.1% |
| 3101594 | 1 | < 0.1% |
| 1832855 | 1 | < 0.1% |
| 1372004 | 1 | < 0.1% |
| 396521 | 1 | < 0.1% |
| 1660059 | 1 | < 0.1% |
| 1904282 | 1 | < 0.1% |
| 2842461 | 1 | < 0.1% |
| Other values (4819) | 4819 |
| Value | Count | Frequency (%) |
| 517 | 1 | |
| 1057 | 1 | |
| 5618 | 1 | |
| 5888 | 1 | |
| 7546 | 1 | |
| 9493 | 1 | |
| 10404 | 1 | |
| 12855 | 1 | |
| 13079 | 1 | |
| 13277 | 1 |
| Value | Count | Frequency (%) |
| 4828877 | 1 | |
| 4825930 | 1 | |
| 4824489 | 1 | |
| 4824330 | 1 | |
| 4824260 | 1 | |
| 4823057 | 1 | |
| 4822238 | 1 | |
| 4822048 | 1 | |
| 4820090 | 1 | |
| 4819412 | 1 |
msno
| Distinct | 4778 |
|---|---|
| Distinct (%) | 98.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 37.9 KiB |
| xCajYQxbS49eajWxKzqz334LwCJ9KfxrJd7UxNYhaIY= | 2 |
|---|---|
| svBSLpPM7bavZWC9PqWaFH2ggaLUB3v+hxuYaimR4bE= | 2 |
| ybcB3cPPYvNQYPPtXfwr7Kn2mKjTB8A76yr0xdgvpEE= | 2 |
| 6unhde+Y+GQ3rF9ycEcUkCfaZ7KmjZPgH6rpklp4c/c= | 2 |
| qarZbGz3cWOBvfn6E0FdUFjD61PNamg2XHThFPBC170= | 2 |
| Other values (4773) |
Length
| Max length | 44 |
|---|---|
| Median length | 44 |
| Mean length | 44 |
| Min length | 44 |
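Every value is exactly 44 characters long, and the character counts in this report show one `=` per observation. That pattern is consistent with Base64-encoded 32-byte values (for example a 256-bit hash), although the report does not say how `msno` was generated, so treat this as an assumption. A minimal check on one sample value from this report:

```python
import base64

# One sample value shown in this report.
sample = "ZMqqE8u7R/12bWzAViCUlfyEZHrcw80XEsrQHU+dZUg="

# Assumption: the identifier is standard Base64.
raw = base64.b64decode(sample)
print(len(sample), "characters ->", len(raw), "bytes")

# The reported total character count equals rows times the fixed length.
n_rows, chars_per_value = 4829, 44
total_characters = n_rows * chars_per_value
```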
Characters and Unicode
| Total characters | 212476 |
|---|---|
| Distinct characters | 65 |
| Distinct categories | 5 |
| Distinct scripts | 2 |
| Distinct blocks | 1 |
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
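As an illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one sample value from this report; the two-letter codes correspond to the category names used in the tables (`Lu` uppercase letter, `Ll` lowercase letter, `Nd` decimal number, `Sm` math symbol, `Po` other punctuation). Whether the profiling tool uses this module internally is not stated here.

```python
import unicodedata
from collections import Counter

# One sample value shown in this report.
sample = "ZMqqE8u7R/12bWzAViCUlfyEZHrcw80XEsrQHU+dZUg="

# Count the Unicode general category of every character.
categories = Counter(unicodedata.category(ch) for ch in sample)

for code, count in categories.most_common():
    print(code, count)
```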
Unique
| Unique | 4727 |
|---|---|
| Unique (%) | 97.9% |
Sample
| 1st row | ZMqqE8u7R/12bWzAViCUlfyEZHrcw80XEsrQHU+dZUg= |
|---|---|
| 2nd row | MDBdrLTDMSW6N5HBrQ5zMINuP+40NtsNIaczfyZk//M= |
| 3rd row | nXhDC780WU3nOGfeCjSmvHu2LtVD3MpJQjGljN7lxvg= |
| 4th row | K5tYghuu/DYjBZZaH+hfT3hOo+Oq/SSQ08A7+EgjWq4= |
| 5th row | AwebbTh1Flk5wlsSVzpqSt3b0CSdqemjXv53zdhiqWc= |
Common Values
| Value | Count | Frequency (%) |
| xCajYQxbS49eajWxKzqz334LwCJ9KfxrJd7UxNYhaIY= | 2 | < 0.1% |
| svBSLpPM7bavZWC9PqWaFH2ggaLUB3v+hxuYaimR4bE= | 2 | < 0.1% |
| ybcB3cPPYvNQYPPtXfwr7Kn2mKjTB8A76yr0xdgvpEE= | 2 | < 0.1% |
| 6unhde+Y+GQ3rF9ycEcUkCfaZ7KmjZPgH6rpklp4c/c= | 2 | < 0.1% |
| qarZbGz3cWOBvfn6E0FdUFjD61PNamg2XHThFPBC170= | 2 | < 0.1% |
| fidtcGs0Se8+L+vu5U5xn0CO49IF2JhUOkZGoSdY80M= | 2 | < 0.1% |
| 6V3gtsS1+lLYYpOSHQqIReGrpMoerNiA2SUmNEcg5M8= | 2 | < 0.1% |
| 0bNov9ubohMPwSlfi4SDwXG7hsNlxowIo9uZFZwBzj4= | 2 | < 0.1% |
| MNYQViUgJh/ZnE256u+Klgu0UrEF/367nEDidkPYUfA= | 2 | < 0.1% |
| pf9sgISBlORc9yqsD9+zU00MHf6Tw95EqciYtgm6YUQ= | 2 | < 0.1% |
| Other values (4768) | 4809 |
Length
Histogram of lengths of the category
| Value | Count | Frequency (%) |
| xcajyqxbs49eajwxkzqz334lwcj9kfxrjd7uxnyhaiy | 2 | < 0.1% |
| svbslppm7bavzwc9pqwafh2ggalub3v+hxuyaimr4be | 2 | < 0.1% |
| ybcb3cppyvnqypptxfwr7kn2mkjtb8a76yr0xdgvpee | 2 | < 0.1% |
| 6unhde+y+gq3rf9ycecukcfaz7kmjzpgh6rpklp4c/c | 2 | < 0.1% |
| qarzbgz3cwobvfn6e0fdufjd61pnamg2xhthfpbc170 | 2 | < 0.1% |
| fidtcgs0se8+l+vu5u5xn0co49if2jhuokzgosdy80m | 2 | < 0.1% |
| 6v3gtss1+llyyposhqqiregrpmoernia2sumnecg5m8 | 2 | < 0.1% |
| 0bnov9ubohmpwslfi4sdwxg7hsnlxowio9uzfzwbzj4 | 2 | < 0.1% |
| mnyqviugjh/zne256u+klgu0uref/367nedidkpyufa | 2 | < 0.1% |
| pf9sgisblorc9yqsd9+zu00mhf6tw95eqciytgm6yuq | 2 | < 0.1% |
| Other values (4768) | 4809 |
Most occurring characters
| Value | Count | Frequency (%) |
| = | 4829 | 2.3% |
| 0 | 3604 | 1.7% |
| U | 3584 | 1.7% |
| Y | 3570 | 1.7% |
| E | 3543 | 1.7% |
| M | 3528 | 1.7% |
| 8 | 3484 | 1.6% |
| A | 3475 | 1.6% |
| s | 3475 | 1.6% |
| c | 3456 | 1.6% |
| Other values (55) | 175928 |
Most occurring categories
| Value | Count | Frequency (%) |
| Uppercase Letter | 84878 | 39.9% |
| Lowercase Letter | 84246 | 39.6% |
| Decimal Number | 32356 | 15.2% |
| Math Symbol | 7901 | 3.7% |
| Other Punctuation | 3095 | 1.5% |
Most frequent character per category
Uppercase Letter
| Value | Count | Frequency (%) |
| U | 3584 | 4.2% |
| Y | 3570 | 4.2% |
| E | 3543 | 4.2% |
| M | 3528 | 4.2% |
| A | 3475 | 4.1% |
| I | 3432 | 4.0% |
| Q | 3401 | 4.0% |
| H | 3286 | 3.9% |
| B | 3245 | 3.8% |
| O | 3237 | 3.8% |
| Other values (16) | 50577 |
Lowercase Letter
| Value | Count | Frequency (%) |
| s | 3475 | 4.1% |
| c | 3456 | 4.1% |
| w | 3452 | 4.1% |
| g | 3449 | 4.1% |
| o | 3444 | 4.1% |
| k | 3405 | 4.0% |
| y | 3281 | 3.9% |
| r | 3256 | 3.9% |
| m | 3253 | 3.9% |
| i | 3240 | 3.8% |
| Other values (16) | 50535 |
Decimal Number
| Value | Count | Frequency (%) |
| 0 | 3604 | |
| 8 | 3484 | |
| 4 | 3422 | |
| 2 | 3192 | |
| 3 | 3187 | |
| 6 | 3161 | |
| 1 | 3130 | |
| 9 | 3066 | |
| 7 | 3062 | |
| 5 | 3048 |
Math Symbol
| Value | Count | Frequency (%) |
| = | 4829 | |
| + | 3072 |
Other Punctuation
| Value | Count | Frequency (%) |
| / | 3095 |
Most occurring scripts
| Value | Count | Frequency (%) |
| Latin | 169124 | 79.6% |
| Common | 43352 | 20.4% |
Most frequent character per script
Latin
| Value | Count | Frequency (%) |
| U | 3584 | 2.1% |
| Y | 3570 | 2.1% |
| E | 3543 | 2.1% |
| M | 3528 | 2.1% |
| A | 3475 | 2.1% |
| s | 3475 | 2.1% |
| c | 3456 | 2.0% |
| w | 3452 | 2.0% |
| g | 3449 | 2.0% |
| o | 3444 | 2.0% |
| Other values (42) | 134148 |
Common
| Value | Count | Frequency (%) |
| = | 4829 | |
| 0 | 3604 | 8.3% |
| 8 | 3484 | 8.0% |
| 4 | 3422 | 7.9% |
| 2 | 3192 | 7.4% |
| 3 | 3187 | 7.4% |
| 6 | 3161 | 7.3% |
| 1 | 3130 | 7.2% |
| / | 3095 | 7.1% |
| + | 3072 | 7.1% |
| Other values (3) | 9176 |
Most occurring blocks
| Value | Count | Frequency (%) |
| ASCII | 212476 | 100.0% |
Most frequent character per block
ASCII
| Value | Count | Frequency (%) |
| = | 4829 | 2.3% |
| 0 | 3604 | 1.7% |
| U | 3584 | 1.7% |
| Y | 3570 | 1.7% |
| E | 3543 | 1.7% |
| M | 3528 | 1.7% |
| 8 | 3484 | 1.6% |
| A | 3475 | 1.6% |
| s | 3475 | 1.6% |
| c | 3456 | 1.6% |
| Other values (55) | 175928 |
date
Real number (ℝ≥0)
| Distinct | 679 |
|---|---|
| Distinct (%) | 14.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 20160819.16 |
| Minimum | 20150126 |
| Maximum | 20170228 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 19.0 KiB |
Quantile statistics
| Minimum | 20150126 |
|---|---|
| 5-th percentile | 20150828 |
| Q1 | 20160321 |
| median | 20160810 |
| Q3 | 20161127 |
| 95-th percentile | 20170211 |
| Maximum | 20170228 |
| Range | 20102 |
| Interquartile range (IQR) | 806 |
Descriptive statistics
| Standard deviation | 5371.156213 |
|---|---|
| Coefficient of variation (CV) | 0.0002664155742 |
| Kurtosis | 0.2159113996 |
| Mean | 20160819.16 |
| Median Absolute Deviation (MAD) | 399 |
| Skewness | -0.0781400601 |
| Sum | 9.735659574 × 10^10 |
| Variance | 28849319.07 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 20170122 | 25 | 0.5% |
| 20170223 | 23 | 0.5% |
| 20170104 | 21 | 0.4% |
| 20170228 | 20 | 0.4% |
| 20170113 | 20 | 0.4% |
| 20160923 | 20 | 0.4% |
| 20161225 | 20 | 0.4% |
| 20170219 | 19 | 0.4% |
| 20160727 | 19 | 0.4% |
| 20170224 | 19 | 0.4% |
| Other values (669) | 4623 |
| Value | Count | Frequency (%) |
| 20150126 | 1 | < 0.1% |
| 20150128 | 1 | < 0.1% |
| 20150129 | 1 | < 0.1% |
| 20150131 | 1 | < 0.1% |
| 20150207 | 2 | |
| 20150209 | 2 | |
| 20150210 | 1 | < 0.1% |
| 20150212 | 1 | < 0.1% |
| 20150214 | 2 | |
| 20150216 | 4 |
| Value | Count | Frequency (%) |
| 20170228 | 20 | |
| 20170227 | 11 | |
| 20170226 | 14 | |
| 20170225 | 10 | |
| 20170224 | 19 | |
| 20170223 | 23 | |
| 20170222 | 13 | |
| 20170221 | 17 | |
| 20170220 | 12 | |
| 20170219 | 19 |
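The report profiles `date` as a real number, but the values look like calendar dates packed into integers in YYYYMMDD form (an assumption based on the minimum 20150126 and maximum 20170228). Under that reading, statistics such as Range, IQR, and standard deviation are computed on the packed integers and do not measure elapsed time. A minimal sketch of the difference, using only the standard library:

```python
from datetime import datetime

def parse_yyyymmdd(value):
    """Convert an integer such as 20150126 to a date (assumed YYYYMMDD layout)."""
    return datetime.strptime(str(value), "%Y%m%d").date()

first = parse_yyyymmdd(20150126)  # reported minimum
last = parse_yyyymmdd(20170228)   # reported maximum

integer_range = 20170228 - 20150126  # what the report lists as "Range"
span_days = (last - first).days      # actual calendar span in days

print(integer_range, span_days)
```

Parsing the column into a date type before profiling would make the summary statistics interpretable as time spans.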
num_25
| Distinct | 99 |
|---|---|
| Distinct (%) | 2.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 6.998964589 |
| Minimum | 0 |
| Maximum | 413 |
| Zeros | 1196 |
| Zeros (%) | 24.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 2 |
| Q3 | 7 |
| 95-th percentile | 27 |
| Maximum | 413 |
| Range | 413 |
| Interquartile range (IQR) | 6 |
Descriptive statistics
| Standard deviation | 15.93768282 |
|---|---|
| Coefficient of variation (CV) | 2.277148658 |
| Kurtosis | 181.012801 |
| Mean | 6.998964589 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 10.18108207 |
| Sum | 33798 |
| Variance | 254.0097338 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 1196 | |
| 1 | 756 | |
| 2 | 494 | |
| 3 | 374 | 7.7% |
| 4 | 266 | 5.5% |
| 5 | 239 | 4.9% |
| 6 | 157 | 3.3% |
| 7 | 148 | 3.1% |
| 8 | 122 | 2.5% |
| 9 | 107 | 2.2% |
| Other values (89) | 970 |
| Value | Count | Frequency (%) |
| 0 | 1196 | |
| 1 | 756 | |
| 2 | 494 | |
| 3 | 374 | 7.7% |
| 4 | 266 | 5.5% |
| 5 | 239 | 4.9% |
| 6 | 157 | 3.3% |
| 7 | 148 | 3.1% |
| 8 | 122 | 2.5% |
| 9 | 107 | 2.2% |
| Value | Count | Frequency (%) |
| 413 | 1 | |
| 330 | 1 | |
| 322 | 1 | |
| 239 | 1 | |
| 222 | 1 | |
| 166 | 1 | |
| 164 | 1 | |
| 141 | 1 | |
| 134 | 1 | |
| 133 | 1 |
num_50
| Distinct | 47 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.764340443 |
| Minimum | 0 |
| Maximum | 122 |
| Zeros | 2235 |
| Zeros (%) | 46.3% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 1 |
| Q3 | 2 |
| 95-th percentile | 7 |
| Maximum | 122 |
| Range | 122 |
| Interquartile range (IQR) | 2 |
Descriptive statistics
| Standard deviation | 4.706018186 |
|---|---|
| Coefficient of variation (CV) | 2.667295989 |
| Kurtosis | 174.5560379 |
| Mean | 1.764340443 |
| Median Absolute Deviation (MAD) | 1 |
| Skewness | 10.50907629 |
| Sum | 8520 |
| Variance | 22.14660717 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=47)
| Value | Count | Frequency (%) |
| 0 | 2235 | |
| 1 | 1146 | |
| 2 | 559 | 11.6% |
| 3 | 311 | 6.4% |
| 4 | 170 | 3.5% |
| 5 | 101 | 2.1% |
| 6 | 63 | 1.3% |
| 7 | 53 | 1.1% |
| 8 | 37 | 0.8% |
| 10 | 24 | 0.5% |
| Other values (37) | 130 | 2.7% |
| Value | Count | Frequency (%) |
| 0 | 2235 | |
| 1 | 1146 | |
| 2 | 559 | 11.6% |
| 3 | 311 | 6.4% |
| 4 | 170 | 3.5% |
| 5 | 101 | 2.1% |
| 6 | 63 | 1.3% |
| 7 | 53 | 1.1% |
| 8 | 37 | 0.8% |
| 9 | 18 | 0.4% |
| Value | Count | Frequency (%) |
| 122 | 1 | |
| 98 | 1 | |
| 72 | 1 | |
| 65 | 1 | |
| 60 | 1 | |
| 57 | 1 | |
| 56 | 1 | |
| 55 | 2 | |
| 47 | 1 | |
| 46 | 1 |
num_75
| Distinct | 22 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.034789812 |
| Minimum | 0 |
| Maximum | 68 |
| Zeros | 2600 |
| Zeros (%) | 53.8% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 68 |
| Range | 68 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 2.085583694 |
|---|---|
| Coefficient of variation (CV) | 2.015466011 |
| Kurtosis | 243.032609 |
| Mean | 1.034789812 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 10.04576581 |
| Sum | 4997 |
| Variance | 4.349659344 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=22)
| Value | Count | Frequency (%) |
| 0 | 2600 | |
| 1 | 1157 | |
| 2 | 486 | 10.1% |
| 3 | 247 | 5.1% |
| 4 | 125 | 2.6% |
| 5 | 79 | 1.6% |
| 6 | 48 | 1.0% |
| 7 | 26 | 0.5% |
| 9 | 14 | 0.3% |
| 8 | 11 | 0.2% |
| Other values (12) | 36 | 0.7% |
| Value | Count | Frequency (%) |
| 0 | 2600 | |
| 1 | 1157 | |
| 2 | 486 | 10.1% |
| 3 | 247 | 5.1% |
| 4 | 125 | 2.6% |
| 5 | 79 | 1.6% |
| 6 | 48 | 1.0% |
| 7 | 26 | 0.5% |
| 8 | 11 | 0.2% |
| 9 | 14 | 0.3% |
| Value | Count | Frequency (%) |
| 68 | 1 | < 0.1% |
| 30 | 1 | < 0.1% |
| 24 | 1 | < 0.1% |
| 23 | 2 | < 0.1% |
| 21 | 1 | < 0.1% |
| 18 | 2 | < 0.1% |
| 15 | 2 | < 0.1% |
| 14 | 2 | < 0.1% |
| 13 | 1 | < 0.1% |
| 12 | 6 |
num_985
| Distinct | 26 |
|---|---|
| Distinct (%) | 0.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1.137709671 |
| Minimum | 0 |
| Maximum | 135 |
| Zeros | 2537 |
| Zeros (%) | 52.5% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 4 |
| Maximum | 135 |
| Range | 135 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 3.237794079 |
|---|---|
| Coefficient of variation (CV) | 2.845887806 |
| Kurtosis | 803.4933634 |
| Mean | 1.137709671 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 22.86078337 |
| Sum | 5494 |
| Variance | 10.48331049 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=26)
| Value | Count | Frequency (%) |
| 0 | 2537 | |
| 1 | 1160 | |
| 2 | 523 | 10.8% |
| 3 | 239 | 4.9% |
| 4 | 151 | 3.1% |
| 5 | 73 | 1.5% |
| 6 | 49 | 1.0% |
| 7 | 25 | 0.5% |
| 9 | 16 | 0.3% |
| 8 | 16 | 0.3% |
| Other values (16) | 40 | 0.8% |
| Value | Count | Frequency (%) |
| 0 | 2537 | |
| 1 | 1160 | |
| 2 | 523 | 10.8% |
| 3 | 239 | 4.9% |
| 4 | 151 | 3.1% |
| 5 | 73 | 1.5% |
| 6 | 49 | 1.0% |
| 7 | 25 | 0.5% |
| 8 | 16 | 0.3% |
| 9 | 16 | 0.3% |
| Value | Count | Frequency (%) |
| 135 | 1 | < 0.1% |
| 99 | 1 | < 0.1% |
| 48 | 2 | |
| 42 | 1 | < 0.1% |
| 28 | 1 | < 0.1% |
| 26 | 1 | < 0.1% |
| 24 | 1 | < 0.1% |
| 19 | 1 | < 0.1% |
| 17 | 2 | |
| 16 | 3 |
num_100
| Distinct | 203 |
|---|---|
| Distinct (%) | 4.2% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 28.31807828 |
| Minimum | 0 |
| Maximum | 393 |
| Zeros | 192 |
| Zeros (%) | 4.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 6 |
| median | 16 |
| Q3 | 35 |
| 95-th percentile | 107 |
| Maximum | 393 |
| Range | 393 |
| Interquartile range (IQR) | 29 |
Descriptive statistics
| Standard deviation | 36.28610944 |
|---|---|
| Coefficient of variation (CV) | 1.281376126 |
| Kurtosis | 9.957715617 |
| Mean | 28.31807828 |
| Median Absolute Deviation (MAD) | 12 |
| Skewness | 2.681760175 |
| Sum | 136748 |
| Variance | 1316.681738 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 239 | 4.9% |
| 4 | 197 | 4.1% |
| 0 | 192 | 4.0% |
| 2 | 187 | 3.9% |
| 3 | 173 | 3.6% |
| 5 | 172 | 3.6% |
| 8 | 157 | 3.3% |
| 7 | 147 | 3.0% |
| 6 | 131 | 2.7% |
| 10 | 127 | 2.6% |
| Other values (193) | 3107 |
| Value | Count | Frequency (%) |
| 0 | 192 | |
| 1 | 239 | |
| 2 | 187 | |
| 3 | 173 | |
| 4 | 197 | |
| 5 | 172 | |
| 6 | 131 | |
| 7 | 147 | |
| 8 | 157 | |
| 9 | 121 |
| Value | Count | Frequency (%) |
| 393 | 1 | |
| 318 | 1 | |
| 312 | 1 | |
| 274 | 1 | |
| 259 | 1 | |
| 252 | 1 | |
| 246 | 1 | |
| 245 | 1 | |
| 240 | 2 | |
| 228 | 1 |
num_unq
| Distinct | 188 |
|---|---|
| Distinct (%) | 3.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 29.0581901 |
| Minimum | 1 |
| Maximum | 436 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 2 |
| Q1 | 8 |
| median | 18 |
| Q3 | 38 |
| 95-th percentile | 95 |
| Maximum | 436 |
| Range | 435 |
| Interquartile range (IQR) | 30 |
Descriptive statistics
| Standard deviation | 32.72668902 |
|---|---|
| Coefficient of variation (CV) | 1.126246642 |
| Kurtosis | 12.49604329 |
| Mean | 29.0581901 |
| Median Absolute Deviation (MAD) | 13 |
| Skewness | 2.683394964 |
| Sum | 140322 |
| Variance | 1071.036174 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 205 | 4.2% |
| 4 | 180 | 3.7% |
| 3 | 174 | 3.6% |
| 2 | 172 | 3.6% |
| 5 | 172 | 3.6% |
| 8 | 159 | 3.3% |
| 6 | 134 | 2.8% |
| 11 | 134 | 2.8% |
| 7 | 127 | 2.6% |
| 12 | 120 | 2.5% |
| Other values (178) | 3252 |
| Value | Count | Frequency (%) |
| 1 | 205 | |
| 2 | 172 | |
| 3 | 174 | |
| 4 | 180 | |
| 5 | 172 | |
| 6 | 134 | |
| 7 | 127 | |
| 8 | 159 | |
| 9 | 119 | |
| 10 | 114 |
| Value | Count | Frequency (%) |
| 436 | 1 | |
| 312 | 1 | |
| 295 | 1 | |
| 280 | 1 | |
| 243 | 1 | |
| 220 | 1 | |
| 218 | 1 | |
| 213 | 1 | |
| 208 | 1 | |
| 206 | 1 |
total_secs
| Distinct | 3405 |
|---|---|
| Distinct (%) | 70.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 7 |
| Infinite (%) | 0.1% |
| Mean | nan |
| Minimum | -inf |
| Maximum | inf |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 1 |
| Negative (%) | < 0.1% |
| Memory size | 9.6 KiB |
Quantile statistics
| Minimum | -inf |
|---|---|
| 5-th percentile | 303.25 |
| Q1 | 1805 |
| median | 4408 |
| Q3 | 9552 |
| 95-th percentile | 27369.6 |
| Maximum | inf |
| Range | inf |
| Interquartile range (IQR) | 7747 |
Descriptive statistics
| Standard deviation | nan |
|---|---|
| Coefficient of variation (CV) | nan |
| Kurtosis | nan |
| Mean | nan |
| Median Absolute Deviation (MAD) | 3170 |
| Skewness | nan |
| Sum | nan |
| Variance | nan |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| inf | 6 | 0.1% |
| 2072 | 6 | 0.1% |
| 4968 | 5 | 0.1% |
| 10072 | 5 | 0.1% |
| 4136 | 5 | 0.1% |
| 2390 | 5 | 0.1% |
| 4916 | 5 | 0.1% |
| 14032 | 5 | 0.1% |
| 10232 | 5 | 0.1% |
| 6544 | 5 | 0.1% |
| Other values (3395) | 4777 |
| Value | Count | Frequency (%) |
| -inf | 1 | |
| 1.801757812 | 1 | |
| 2.533203125 | 1 | |
| 2.89453125 | 1 | |
| 4.01171875 | 1 | |
| 4.45703125 | 1 | |
| 4.96484375 | 1 | |
| 6.94921875 | 1 | |
| 7.17578125 | 1 | |
| 9.4609375 | 1 |
| Value | Count | Frequency (%) |
| inf | 6 | |
| 61888 | 1 | < 0.1% |
| 60896 | 1 | < 0.1% |
| 60544 | 1 | < 0.1% |
| 56160 | 1 | < 0.1% |
| 55776 | 1 | < 0.1% |
| 54848 | 1 | < 0.1% |
| 53952 | 1 | < 0.1% |
| 53824 | 1 | < 0.1% |
| 52416 | 1 | < 0.1% |
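The report counts 7 infinite values in this variable (6 of `inf` and 1 of `-inf`), which explains why the mean, sum, variance, and related statistics are `nan` while the quantiles stay finite: adding `inf` and `-inf` is undefined, whereas order statistics only depend on ranking. The sketch below reproduces the effect on a tiny list and shows one possible remedy, dropping non-finite values before aggregating. The list is illustrative (three finite durations from the sample rows plus both infinities), and filtering is only one option; whether it is appropriate depends on why the infinities occur.

```python
import math
import statistics

# Illustrative values: three finite durations plus both infinities.
total_secs = [5612.0, 1565.0, 1997.0, float("inf"), float("-inf")]

# inf + (-inf) is undefined, so the naive sum becomes nan.
naive_sum = sum(total_secs)

# One possible remedy: keep only finite values before aggregating.
finite = [v for v in total_secs if math.isfinite(v)]
finite_mean = statistics.mean(finite)

print(naive_sum, finite_mean)
```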
Pearson's r
Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
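The calculation described above can be sketched in a few lines of plain Python. The function name `pearson_r` is illustrative, and the input uses `num_100` and `total_secs` from the first five sample rows of this report, so the result describes only those five rows, not the full dataset.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

# num_100 and total_secs from the first five sample rows of this report.
num_100 = [20, 7, 7, 4, 13]
total_secs = [5612.0, 1565.0, 1997.0, 1032.0, 2944.0]
r = pearson_r(num_100, total_secs)

# Shifting and rescaling one variable leaves r unchanged.
r_scaled = pearson_r([2 * v + 10 for v in num_100], total_secs)

print(round(r, 3), round(r_scaled, 3))
```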
Spearman's ρ
Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
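A minimal sketch of that definition: rank both variables, then apply the Pearson formula to the ranks. The helper names are illustrative, ties receive the average of their ranks (a common convention, assumed here), and the example data is made up to show a monotonic but nonlinear relationship.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)

def spearman_rho(x, y):
    """Pearson correlation of the rank variables."""
    return pearson_r(average_ranks(x), average_ranks(y))

# Hypothetical data: y = x**3 is monotonic but not linear.
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
print(spearman_rho(x, y), pearson_r(x, y))
```

On this data ρ reaches its maximum while r stays below 1, which is the sense in which ρ is better at catching nonlinear monotonic relationships.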
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation. To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
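The pair-counting definition above can be written directly as a short function. This sketch implements exactly that formula (often called τ-a); library implementations such as SciPy's `kendalltau` default to a tie-adjusted variant, so results can differ when the data contains ties. The function name and example data are illustrative.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie, which counts as neither
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Hypothetical data: one swapped pair out of six.
tau = kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4])
print(tau)
```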
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. Extensive documentation is available from the Phik project.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display which lets you quickly pick out patterns in data completion.
First rows
| | df_index | msno | date | num_25 | num_50 | num_75 | num_985 | num_100 | num_unq | total_secs |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 47813 | ZMqqE8u7R/12bWzAViCUlfyEZHrcw80XEsrQHU+dZUg= | 20160923 | 2 | 3 | 1 | 1 | 20 | 26 | 5612.0 |
| 1 | 610745 | MDBdrLTDMSW6N5HBrQ5zMINuP+40NtsNIaczfyZk//M= | 20161216 | 0 | 0 | 1 | 0 | 7 | 2 | 1565.0 |
| 2 | 662811 | nXhDC780WU3nOGfeCjSmvHu2LtVD3MpJQjGljN7lxvg= | 20160209 | 14 | 1 | 0 | 0 | 7 | 20 | 1997.0 |
| 3 | 801517 | K5tYghuu/DYjBZZaH+hfT3hOo+Oq/SSQ08A7+EgjWq4= | 20160531 | 2 | 0 | 1 | 0 | 4 | 6 | 1032.0 |
| 4 | 2001312 | AwebbTh1Flk5wlsSVzpqSt3b0CSdqemjXv53zdhiqWc= | 20160919 | 3 | 0 | 0 | 0 | 13 | 16 | 2944.0 |
| 5 | 4807528 | cpj7Ozph04uqd82G4+y7mlIizoMRAJVq+7Dr3g4Tiak= | 20170110 | 5 | 1 | 0 | 1 | 2 | 6 | 1185.0 |
| 6 | 162521 | /LQQ2BhoSMYQwfMzhfqExl1QO/wVpyEEaM4VKE69lBw= | 20160608 | 3 | 1 | 0 | 1 | 22 | 27 | 5884.0 |
| 7 | 2602174 | m/cb+oHs01UyIl7NKsz5UODg3LdP4OPiYHOOCE8aJzw= | 20160724 | 71 | 2 | 2 | 11 | 40 | 89 | 13208.0 |
| 8 | 4572800 | kFxaXeLUTCaDcfrFYsswCcRDzcJ4yv4G0GIOi9+8XQk= | 20160815 | 44 | 1 | 0 | 0 | 3 | 40 | 1528.0 |
| 9 | 4392681 | izbJFXkVseLFSo88hYbeNvzebivlVb3O2eZmHonJ6Ds= | 20170212 | 1 | 2 | 2 | 1 | 84 | 61 | 20800.0 |
Last rows
| | df_index | msno | date | num_25 | num_50 | num_75 | num_985 | num_100 | num_unq | total_secs |
|---|---|---|---|---|---|---|---|---|---|---|
| 4819 | 730038 | L6GXAIW/sG8eTsJRRSjMjWstldMzztJ4mdY5C1sx38s= | 20150928 | 7 | 0 | 1 | 1 | 113 | 33 | 29184.0 |
| 4820 | 3176937 | fs5aPuroxK1X0EIPp1Y+lIDxXSb40pTI6isxjhg3WKw= | 20170204 | 6 | 0 | 0 | 1 | 3 | 5 | 713.5 |
| 4821 | 586878 | IPzfhGbM+ze2M67t+uW9Cy77NbSWTPJSriMxJO4JD4k= | 20160105 | 0 | 0 | 0 | 0 | 28 | 28 | 6816.0 |
| 4822 | 1454964 | X7W92ku30OKsXA/z+318ZVtv5VZO2SINV1pIMr6IJOc= | 20160113 | 17 | 24 | 1 | 2 | 7 | 36 | 4284.0 |
| 4823 | 2919035 | eClquUEzNz8F+JdCbfEs9/aCw0wuhyoWQT3GMqMF6Nc= | 20160403 | 5 | 1 | 0 | 0 | 5 | 11 | 1220.0 |
| 4824 | 342522 | K910kzvQO5IkkCo4cesBF2I0FHSxsvpcbZICrmevxEU= | 20151205 | 16 | 2 | 0 | 0 | 51 | 64 | 13688.0 |
| 4825 | 2533079 | iGfPwHz9KLibjU9pGGvR0quT887+b0OyBAVX6sw7CyY= | 20161002 | 44 | 4 | 2 | 2 | 15 | 58 | 4804.0 |
| 4826 | 4161858 | uext6x4kOlP/we/gaAu4q4dLPSPHo5ODHkD8Hsw1xZA= | 20150804 | 10 | 0 | 1 | 0 | 22 | 27 | 5732.0 |
| 4827 | 1359366 | /Ib3hKp0WET3C7tCjv4mFKEV2i1EacuTcs0nocTosko= | 20151028 | 1 | 2 | 0 | 2 | 2 | 7 | 1043.0 |
| 4828 | 1266841 | 4QRW4r1vzS9pXyY15vJjJMQw9OlHJTv21VBRviHnd/o= | 20160212 | 42 | 4 | 2 | 3 | 60 | 82 | 16784.0 |